Method and Apparatus for Video Encoding Optimization

ABSTRACT

There is provided an encoder and a corresponding method for encoding video signal data corresponding to a plurality of pictures. The encoder includes an overlapping window analysis unit for performing a video analysis of the video signal data using a plurality of overlapping analysis windows with respect to at least some of the plurality of pictures corresponding to the video signal data, and for adapting encoding parameters for the video signal data based on a result of the video analysis.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 60/581,280, filed Jun. 18, 2004, which is incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

The present invention generally relates to video encoders and decoders and, more particularly, to a method and apparatus for video encoding optimization.

BACKGROUND OF THE INVENTION

Multi-pass video encoding methods have been used in many video coding architectures, such as MPEG-2 and JVT/H.264/MPEG AVC, in order to achieve better coding efficiency. The idea behind these methods is to encode the entire sequence over several iterations, while performing an analysis and collecting statistics that can be used in future iterations in an attempt to improve encoding performance.

Two-pass encoding schemes have already been used in several encoding systems, including the MICROSOFT® WINDOWS MEDIA® and REALVIDEO® encoders. According to such encoding schemes, the encoder first performs an initial encoding pass over the entire sequence using some initial predefined settings, and collects statistics with regard to the encoding efficiency of each picture within the sequence. After this process is completed, the entire sequence is reprocessed and coded one more time, while taking into account the previously generated statistics. This can considerably improve encoding efficiency, and even allows certain predefined encoding restrictions or requirements to be satisfied, such as, for example, a given bitrate constraint for the encoded stream. This is because the encoder is now more aware of the characteristics of the entire video sequence or picture, and thus can more appropriately select the parameters, such as quantizers, deadzoning, and so forth, that will be used for encoding. Some statistics that can be collected during this first encoding pass and used for this purpose are the bits per picture, the spatial activity (i.e., the average normalized macroblock variance and mean), the temporal activity (i.e., the motion vectors/motion vector variance), the distortion (e.g., Mean Square Error (MSE)), and so forth. Although encoding performance can be considerably improved using these methods, they also tend to be of very high complexity, can only be used offline (encode the entire sequence first and then perform a second pass), are not suitable for real-time encoders, and do not always consider all possible statistics that could be inferred from the first encoding step.

SUMMARY OF THE INVENTION

These and other drawbacks and disadvantages of the prior art are addressed by the present invention, which is directed to a method and apparatus for video encoding optimization.

According to an aspect of the present invention, there is provided an encoder for encoding video signal data corresponding to a plurality of pictures. The encoder includes an overlapping window analysis unit for performing a video analysis of the video signal data using a plurality of overlapping analysis windows with respect to at least some of the plurality of pictures corresponding to the video signal data, and for adapting encoding parameters for the video signal data based on a result of the video analysis.

According to another aspect of the present invention, there is provided a method for encoding video signal data corresponding to a plurality of pictures. The method includes the steps of performing a video analysis of the video signal data using a plurality of overlapping analysis windows with respect to at least some of the plurality of pictures corresponding to the video signal data, and adapting encoding parameters for the video signal data based on a result of the video analysis.

These and other aspects, features, and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood in accordance with the following exemplary figures, in which:

FIG. 1 shows a block diagram for an exemplary window-based two-pass encoding architecture in accordance with the principles of the present invention;

FIG. 2 shows a plot for an impact of deadzoning during transformation and quantization in accordance with the principles of the present invention;

FIG. 3 shows a block diagram for an encoder in accordance with the principles of the present invention; and

FIG. 4 shows a flow diagram for an exemplary encoding process in accordance with the principles of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is directed to a method and apparatus for video encoding optimization. Advantageously, the present invention allows a video encoder to compress video sequences at considerably improved subjective and objective quality given a specific bitrate. This is achieved through a non-causal processing of the video sequence, by performing a simple analysis of the current picture compared to N subsequent pictures that have yet to be coded. The results of the analysis can then be utilized by the encoder to make better decisions about the encoding parameters (including, but not limited to, picture/slice types, quantizers, thresholding parameters, Lagrangian λ, and so forth) that are to be used for the encoding of the current picture. Unlike several prior art systems that perform dual or multi-pass encoding of the entire sequence to achieve better encoding performance, the present invention is relatively simple and, thus, has a relatively small impact on complexity. The principles of the present invention may also be used in conjunction with other multi-pass encoding strategies to achieve even higher efficiency. In similar fashion, a causal system (using the M previously coded pictures) can also be created.

In accordance with the principles of the present invention, only a subset overlapping picture window of the entire sequence is first analyzed. Based upon the generated statistics, the encoding parameters for each picture are appropriately adjusted. These encoding parameters may include, but are not limited to, picture/slice type decision (I, P, B), frame/field decision, B picture distance, picture or MB quantization values (QP), coefficient thresholding, Lagrangian parameters, chroma offsetting, weighted prediction, reference picture selection, multiple block size decision, entropy parameter initialization, intra mode decision, deblocking filter parameters, and so forth. Analysis methods with different complexity costs could be used for performing the picture/macroblock analysis, including full first-pass encoding, a simple first-pass motion estimation with spatial analysis, or even simple temporal and spatial analysis metrics including, but not limited to, variance, image difference, and so forth. Furthermore, the overlapping picture window (and the overlap pictures) could be as large or as small (as many or as few) as necessary, thus providing different delay/performance tradeoffs.
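
By way of illustration only, the following sketch shows one way such per-picture encoding parameters could be grouped into a single record that the analysis stage fills in. The sketch is written in Python, and all of the names and default values are hypothetical rather than part of the disclosure.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PictureCodingParams:
        """Hypothetical container for per-picture parameters that the
        overlapping-window analysis may adjust (names are illustrative)."""
        slice_type: str = "P"              # "I", "P", or "B"
        field_coding: bool = False         # frame/field decision
        qp: int = 26                       # picture-level quantization value
        lagrangian_lambda: Optional[float] = None
        deadzone_f: float = 1.0 / 3.0      # rounding/deadzone offset
        b_picture_distance: int = 2
        reference_order: List[int] = field(default_factory=list)
        deblocking_offset: int = 0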

The present description illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Applicant thus regards any means that can provide those functionalities as equivalent to those shown herein.

In accordance with the principles of the present invention, a new multi-pass encoding architecture is disclosed which, unlike previous methods that consider either the entire video sequence or independent windows during each pass, performs each pass on overlapping windows, which allows previously determined characteristics to be reused between adjacent windows. This architecture can still achieve the benefits of multi-pass encoding, such as significantly enhanced video quality, albeit at a lower cost/complexity and with smaller memory requirements/low latency, since the optimal encoding can be achieved using far fewer steps. This feature is especially important in real-time encoding applications, considering that, due to similarities between adjacent windows, it is possible for the encoder to decide the best parameters even during the first pass, thus requiring no further iterations for the final encoding.

Turning to FIG. 1, a window-based two-pass encoding architecture is indicated generally by the reference numeral 100. The processing/analysis window is of size W_(p) pictures, while the overlap allowed between two adjacent groups is of size W_(o). Processing of the first window would provide some initial statistics that could be used to determine a preliminary set of coding characteristics for all frames within this window. More specifically, if a two-pass scheme is used, then all frames that do not also belong to the future window can be immediately coded based on the generated parameters. Nevertheless, this information can be immediately used for the processing/analysis of this future window. For example, these parameters can be used as initial seeds during the processing of this window and, considering the high temporal correlation that exists in most sequences, can improve the analysis. More importantly, the encoding parameters used for the initial frames of this window, which also belong to the previous window due to the selection of W_(o), can be further refined/conditioned based on the newly generated statistics. This basically allows for a faster convergence to the optimal solution if a larger number of iterations/passes is used, e.g., after processing the entire sequence or M adjacent windows. It is to be appreciated that the temporal window can be as large or as small as desired, depending on the capabilities or requirements of the encoder, while iterations of this scheme could also be performed using different window sizes (larger or smaller W_(o) and W_(p)).
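
Purely as an illustrative sketch (in Python) of how such overlapping windows might advance through a sequence, where w_p and w_o correspond to the window and overlap sizes above and the analyze/encode helpers are assumed rather than disclosed:

    def windowed_two_pass(frames, w_p, w_o, analyze, encode):
        """Slide an analysis window of w_p pictures over the sequence with an
        overlap of w_o pictures between adjacent windows."""
        assert 0 <= w_o < w_p
        stats = None
        start = 0
        while start < len(frames):
            window = frames[start:start + w_p]
            # Statistics from the previous window seed the analysis of the
            # current one (high temporal correlation between adjacent windows).
            stats = analyze(window, seed=stats)
            # Frames that do not also belong to the next window will not be
            # revisited, so they can be coded immediately.
            is_last_window = start + w_p >= len(frames)
            to_code = window if is_last_window else window[:w_p - w_o]
            for frame in to_code:
                encode(frame, stats)
            start += w_p - w_o
        return stats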

Many different criteria could be used during the pre-analysis step of our multi-pass scheme. Such criteria could depend on the complexity constraints of the encoder architecture and could range from simple spatio-temporal methods (including, but not limited to, edge detection, texture analysis metrics, and absolute image difference) to more complex strategies (including, but not limited to, Discrete Cosine Transform (DCT) analysis, first-pass intra coding, motion estimation/compensation, and even full encoding). Latency can also be adjusted by increasing or decreasing the analysis and/or the overlapping windows.

As an example of such a system, during this analysis the following criteria can be computed:

For every picture k within window W_(p), the following is computed:

-   (i) For each macroblock at position (i,j), the mean value MBmean(k,i,j), computed as:
    ${MBmean}(k,i,j) = \frac{1}{B_{W} \times B_{H}} \sum_{y=0,\,x=0}^{y=B_{H}-1,\,x=B_{W}-1} c\left[k,\, i \times B_{W}+x,\, j \times B_{H}+y\right]$
-   (ii) the mean square value MBsqmean(k,i,j), computed as:
    ${MBsqmean}(k,i,j) = \frac{1}{B_{W} \times B_{H}} \sum_{y=0,\,x=0}^{y=B_{H}-1,\,x=B_{W}-1} \left(c\left[k,\, i \times B_{W}+x,\, j \times B_{H}+y\right]\right)^{2}$
-   (iii) the variance value MBvariance(k,i,j), computed as:
    ${MBvariance}(k,i,j) = {MBsqmean}(k,i,j) - \left({MBmean}(k,i,j)\right)^{2}$
-   (iv) and, for the entire picture, the Average Macroblock Mean value AMM_(k), computed as:
    ${AMM}_{k} = \frac{1}{{PMB}_{W} \times {PMB}_{H}} \sum_{j=0,\,i=0}^{j={PMB}_{H}-1,\,i={PMB}_{W}-1} {MBmean}(k,i,j)$
-   (v) the Average Macroblock Variance AMV_(k), computed as:
    ${AMV}_{k} = \frac{1}{{PMB}_{W} \times {PMB}_{H}} \sum_{j=0,\,i=0}^{j={PMB}_{H}-1,\,i={PMB}_{W}-1} {MBvariance}(k,i,j)$
-   (vi) and the Picture Variance PV_(k), computed as:
    ${PV}_{k} = \frac{1}{{PMB}_{W} \times {PMB}_{H}} \sum_{j=0,\,i=0}^{j={PMB}_{H}-1,\,i={PMB}_{W}-1} {MBsqmean}(k,i,j) - {AMM}_{k}^{2}$

where c[k,x,y] corresponds to the pixel value of picture k at position (x,y), PMB_(W) and PMB_(H) are the picture's width and height in macroblocks, respectively, and B_(W) and B_(H) are the width and height of each macroblock in the current picture (usually B_(W)=B_(H)=16).
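
By way of illustration, items (i) through (vi) above could be computed as in the following Python/NumPy sketch, where `pic` is assumed to be a two-dimensional luma array for picture k; the function and variable names are illustrative only.

    import numpy as np

    def spatial_stats(pic, b_w=16, b_h=16):
        """Per-macroblock mean/variance and picture-level AMM, AMV, and PV."""
        h, w = pic.shape
        pmb_h, pmb_w = h // b_h, w // b_w
        # Rearrange the picture into (pmb_h, pmb_w, b_h, b_w) macroblocks.
        blocks = (pic[:pmb_h * b_h, :pmb_w * b_w]
                  .reshape(pmb_h, b_h, pmb_w, b_w)
                  .swapaxes(1, 2)
                  .astype(np.float64))
        mb_mean = blocks.mean(axis=(2, 3))            # MBmean(k,i,j)
        mb_sqmean = (blocks ** 2).mean(axis=(2, 3))   # MBsqmean(k,i,j)
        mb_variance = mb_sqmean - mb_mean ** 2        # MBvariance(k,i,j)
        amm = mb_mean.mean()                          # AMM_k
        amv = mb_variance.mean()                      # AMV_k
        pv = mb_sqmean.mean() - amm ** 2              # PV_k
        return mb_mean, mb_variance, amm, amv, pv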

Furthermore, the following temporal characteristics versus picture m (e.g., m=k+1) are also computed as follows:

-   (I) the mean absolute picture difference MAPD_(k,m), computed as:
    ${MAPD}_{k,m} = \frac{1}{{PMB}_{W} \times {PMB}_{H} \times B_{W} \times B_{H}} \sum_{y=0,\,x=0}^{y={PMB}_{H} \times B_{H}-1,\,x={PMB}_{W} \times B_{W}-1} \left| c[k,x,y] - c[m,x,y] \right|$
-   (II) the mean absolute weighted picture difference MAWPD_(k,m), computed as:
    ${MAWPD}_{k,m} = \frac{1}{{PMB}_{W} \times {PMB}_{H} \times B_{W} \times B_{H}} \sum_{y=0,\,x=0}^{y={PMB}_{H} \times B_{H}-1,\,x={PMB}_{W} \times B_{W}-1} \left| c[k,x,y] - \frac{{AMM}_{k}}{{AMM}_{m}}\, c[m,x,y] \right|$
-   (III) the mean absolute offset picture difference MAOPD_(k,m), computed as:
    ${MAOPD}_{k,m} = \frac{1}{{PMB}_{W} \times {PMB}_{H} \times B_{W} \times B_{H}} \sum_{y=0,\,x=0}^{y={PMB}_{H} \times B_{H}-1,\,x={PMB}_{W} \times B_{W}-1} \left| c[k,x,y] - c[m,x,y] + {AMM}_{k} - {AMM}_{m} \right|$
-   (IV) the mean square picture error MSPE_(k,m), computed as:
    ${MSPE}_{k,m} = \frac{1}{{PMB}_{W} \times {PMB}_{H} \times B_{W} \times B_{H}} \sum_{y=0,\,x=0}^{y={PMB}_{H} \times B_{H}-1,\,x={PMB}_{W} \times B_{W}-1} \left( c[k,x,y] - c[m,x,y] \right)^{2}$
-   (V) and the absolute picture variance difference APVD_(k,m), computed as:
    ${APVD}_{k,m} = \left| {PV}_{k} - {PV}_{m} \right|$
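
These temporal metrics could likewise be sketched as follows (an illustrative Python/NumPy interpretation of formulas (I) through (V); the absolute-value form and the helper names are assumptions based on the metric names above):

    import numpy as np

    def temporal_stats(pic_k, pic_m, amm_k, amm_m, pv_k, pv_m):
        """Picture-level temporal metrics between pictures k and m."""
        a = pic_k.astype(np.float64)
        b = pic_m.astype(np.float64)
        mapd = np.abs(a - b).mean()                       # MAPD_{k,m}
        mawpd = np.abs(a - (amm_k / amm_m) * b).mean()    # MAWPD_{k,m}
        maopd = np.abs(a - b + amm_k - amm_m).mean()      # MAOPD_{k,m}
        mspe = ((a - b) ** 2).mean()                      # MSPE_{k,m}
        apvd = abs(pv_k - pv_m)                           # APVD_{k,m}
        return mapd, mawpd, maopd, mspe, apvd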

Other spatio-temporal characteristics that can be computed are the absolute difference of histograms, the histogram of absolute differences, χ² metrics between k and m, edges of k using any (or even multiple) edge operators (including, but not limited to, Canny, Sobel, or Prewitt edge operators), or even field-based metrics for the detection of interlace characteristics of a sequence. Two other statistics that could be useful and could be inferred from the above are the distances of the current picture from the closest past (last_idistance_(k)) and closest future (next_idistance_(k)) coded intra pictures, as measured by, e.g., picture number, coding order, or picture order count (poc). These statistics could be enhanced through the consideration of a scene change/shot detector and/or the default Group of Pictures (GOP) structure. Temporal characteristics could be computed using original or reconstructed images (e.g., if the present invention is applied in a multi-pass implementation), while the computation of these metrics could also consider motion estimation/compensation.
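
As a small illustrative sketch (Python; representing the coded intra pictures as a list of picture numbers is an assumption), last_idistance and next_idistance could be derived as follows:

    def intra_distances(k, intra_picture_numbers):
        """Distances from picture k to the closest past and closest future
        coded intra pictures, measured in picture numbers."""
        past = [k - p for p in intra_picture_numbers if p <= k]
        future = [p - k for p in intra_picture_numbers if p > k]
        last_idistance = min(past) if past else None
        next_idistance = min(future) if future else None
        return last_idistance, next_idistance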

Based on the above metrics, the encoder may decide to modify certain picture, macroblock, or even sub-block parameters related to the encoding process. These include parameters such as quantization values (QP), coefficient deadzoning/thresholding, the Lagrangian value for macroblock encoding, and also picture-level decisions between frames and fields, deblocking filter parameters, coding and reference picture ordering, scene/shot (including, but not limited to, fade/dissolve/wipe/flash, and so forth) detection, GOP structure, and so forth.

In one illustrative embodiment of the present invention, the above parameters are considered as follows to perform picture QP adaptation when coding picture k of slice type cur_slice_type_(k). In this embodiment, distance_(k,k+1) is considered as the distance between two adjacent pictures in terms of picture numbers:

    if (next_idistance_(k) > 3 && cur_slice_type_(k) == I_Slice) {
      if (PV_(k) < 1 && MAPD_(k,k+1) < 1 && last_idistance_(k) > 5*distance_(k,k+1))
        QP_(k) = QP_(k) − 4
      else if (MAPD_(k,k+1) < 3 && (k == 0 || last_idistance_(k) > 5*distance_(k,k+1)))
        QP_(k) = QP_(k) − 3
      else if (MAPD_(k,k+1) < 10)
        QP_(k) = QP_(k) − 2
      else if (MAPD_(k,k+1) < 15)
        QP_(k) = QP_(k) − 1
    }
    else if (AMV_(k) > 10 && AMV_(k) < 60) {
      if (PV_(k) < 500 && next_idistance_(k) > 3*distance_(k,k+1)) {
        if (MAPD_(k,k+1) < 10 && AMV_(k) < 35 && last_idistance_(k) > 2*distance_(k,k+1))
          QP_(k) = QP_(k) − 2
        else
          QP_(k) = QP_(k) − 1
      }
      else if (PV_(k) < 1500 && next_idistance_(k) > 0) {
        if (MAPD_(k,k+1) < 25)
          QP_(k) = QP_(k) − 1
      }
    }
    else if (MAPD_(k,k+1) == 0 && next_idistance_(k) > 3*distance_(k,k+1) && last_idistance_(k) > 4*distance_(k,k+1))
      QP_(k) = QP_(k) − 2
    else if (((MAPD_(k,k+1) < 2 && next_idistance_(k) > 3*distance_(k,k+1) && last_idistance_(k) > 2*distance_(k,k+1)) || last_idistance_(k) > 30) && next_idistance_(k) > 5) {
      if (MAPD_(k,k+1) < 1)
        QP_(k) = QP_(k) − 3
      else if (MAPD_(k,k+1) < 4)
        QP_(k) = QP_(k) − 2
      else if (MAPD_(k,k+1) < 10)
        QP_(k) = QP_(k) − 1
    }

In the above embodiment, no consideration was directed to whether the previous or a nearby past picture has already had its QP updated due to the above rules. This could result in updating QP values more often than necessary, which may be undesirable in terms of rate-distortion (RD) performance. For this purpose, the parameter last_idistance_(k) is updated to be equal to the distance from the last QP-adjusted picture, regardless of its picture type.

Similarly, macroblock/block variance, mean, and edge statistics may be used to determine local encoding parameters. For example, for the selection of the Lagrangian lambda λ for a macroblock at position (i,j), the following rules can be considered:

    if (cur_slice_type_(k) != B_Slice) {
      if (contains_edges(k,i,j))
        λ = 0.50 × 2^((QP−12)/3)
      else if (cur_slice_type_(k) == I_Slice) {
        if (MBvariance(k,i,j) < 15 || MBvariance(k,i,j) > 60)
          λ = 0.58 × 2^((QP−12)/3)
        else if (MBvariance(k,i,j) >= 15 && MBvariance(k,i,j) <= 40)
          λ = 0.65 × 2^((QP−12)/3)
        else
          λ = 0.60 × 2^((QP−12)/3)
      }
      else { // cur_slice_type_(k) == P_Slice
        if (MBvariance(k,i,j) < 15 || MBvariance(k,i,j) > 60)
          λ = 0.60 × 2^((QP−12)/3)
        else if (MBvariance(k,i,j) > 15 && MBvariance(k,i,j) <= 40)
          λ = 0.70 × 2^((QP−12)/3)
        else
          λ = 0.65 × 2^((QP−12)/3)
      }
    }
    else {
      bscale = max(2.00, min(4.00, QP / 6.0));
      if (contains_edges(k,i,j))
        λ = 0.65 × bscale × 2^((QP−12)/3)
      else {
        if (MBvariance(k,i,j) < 15 || MBvariance(k,i,j) > 60)
          λ = 0.68 × bscale × 2^((QP−12)/3)
        else if (MBvariance(k,i,j) > 15 && MBvariance(k,i,j) <= 40)
          λ = 0.72 × bscale × 2^((QP−12)/3)
        else
          λ = 0.70 × 2^((QP−12)/3)
      }
      if (nal_reference_idc == 1)
        λ = 0.80 × λ
    }

Similar decisions can be made for the selection of the quantization values or coefficient thresholding that are used for the residual encoding. More specifically, quantization of a coefficient W in H.264 is performed as follows:

    Z = int({|W| + f×(1<<q_bits)} >> q_bits) · sgn(W)

where Z is the final quantized value, while q_bits is based on the current macroblock's quantizer QP. The term f×(1<<q_bits) serves as a rounding term for the quantization process, which “optimally” should be equal to ½×(1<<q_bits). Turning now to FIG. 2, an impact of deadzoning during transformation and quantization is indicated generally by the reference numeral 200. In FIG. 2, the interval around zero is called a dead zone. A deadzone quantizer is characterized by two parameters: the zero bin width (2s−2f) and the out-bin width (s), as shown in FIG. 2. The optimization of the deadzone through f is often used as an efficient method to achieve good rate-distortion performance. Nevertheless, it is well known that the introduction of a deadzone during this process (i.e., a reduction of the f term) can usually allow an additional bitrate reduction, while having a small impact on quality. This is especially true for lower resolution content, which lacks the details (and the film grain information) of higher resolution material. Although f=½ could be used, this could also cause a rather significant increase in bitrate and hurt performance in terms of RD evaluation.
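
For concreteness, the quantization rule above could be sketched as follows (illustrative Python only; W is assumed to be an integer transform coefficient and q_bits a precomputed shift, so the H.264 scaling details are deliberately left out):

    def quantize(w, f, q_bits):
        """Deadzone quantizer: Z = int((|W| + f*(1<<q_bits)) >> q_bits) * sgn(W)."""
        sign = 1 if w >= 0 else -1
        return sign * ((abs(w) + int(f * (1 << q_bits))) >> q_bits)

For example, with q_bits = 4, a coefficient of magnitude 9 maps to 0 when f = 1/3 (a wider deadzone) but to 1 when f = 1/2 (plain rounding), which illustrates the bitrate/quality trade-off discussed above.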

Considering that some frequencies are more important than others, an alternative approach would be to take this observation into account in order to improve performance. Instead of using a fixed f value for all transform coefficients, different values are considered, essentially in a matrix approach, where each deadzone parameter is selected based on frequency position. Therefore, Z can now be computed as follows:

    Z = int({|W| + f(i,j)×(1<<q_bits)} >> q_bits) · sgn(W)

where i and j correspond to the current column and row within the block transform coefficients. The array f can now depend on slice or macroblock type, and also on the texture characteristics (variance or edge information) of the current block. If a block, for example, contains edges, or has low variance characteristics, it is important not to introduce further artifacts due to the deadzoning process, since these would be more visible. On the other hand, blocks with high spatial activity can mask more artifacts, and deadzoning could be increased without a significant impact on quality. Deadzoning could also be changed depending on whether the current block provides any useful information for blocks in a future picture (i.e., whether any pixel within the current block is used or is not used for predicting other pixels).

As an example, the following deadzoning matrices could be used if a 4×4 transform is used:

    if (cur_slice_type_(k) == I_Slice) {
      if (MBvariance(k,i,j) < 15 || MBvariance(k,i,j) > 60)
        $f = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/3 \\ 1/2 & 1/2 & 1/2 & 1/3 \\ 1/2 & 1/2 & 1/3 & 1/4 \\ 1/3 & 1/3 & 1/4 & 1/5 \end{bmatrix}$
      else if (MBvariance(k,i,j) >= 15 && MBvariance(k,i,j) <= 40 || contains_edges(k,i,j))
        $f = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ 1/2 & 1/2 & 1/2 & 1/2 \\ 1/2 & 1/2 & 1/2 & 1/2 \\ 1/2 & 1/2 & 1/2 & 1/2 \end{bmatrix}$
      else
        $f = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ 1/2 & 1/2 & 1/2 & 1/3 \\ 1/2 & 1/2 & 1/3 & 1/4 \\ 1/2 & 1/3 & 1/4 & 1/5 \end{bmatrix}$
    }
    else if (cur_slice_type_(k) == P_Slice) {
      if (MBvariance(k,i,j) < 15 || MBvariance(k,i,j) > 60)
        $f = \begin{bmatrix} 1/3 & 2/7 & 4/15 & 2/9 \\ 2/7 & 4/15 & 2/9 & 1/6 \\ 4/15 & 2/9 & 1/6 & 1/7 \\ 2/9 & 1/6 & 1/7 & 2/15 \end{bmatrix}$
      else if (MBvariance(k,i,j) > 15 && MBvariance(k,i,j) < 40 || contains_edges(k,i,j))
        $f = \begin{bmatrix} 1/2 & 1/3 & 2/7 & 2/9 \\ 1/3 & 4/15 & 2/9 & 1/6 \\ 2/7 & 2/8 & 1/6 & 1/7 \\ 2/9 & 1/6 & 1/7 & 2/15 \end{bmatrix}$
      else
        $f = \begin{bmatrix} 2/5 & 1/3 & 4/15 & 2/9 \\ 1/3 & 4/15 & 2/9 & 1/6 \\ 4/15 & 2/9 & 1/6 & 1/7 \\ 2/9 & 1/6 & 1/7 & 2/15 \end{bmatrix}$
    }
    else { // B_Slices
      $f = \begin{bmatrix} 1/4 & 1/6 & 1/6 & 1/6 \\ 1/6 & 1/6 & 1/6 & 1/7 \\ 1/6 & 1/6 & 1/7 & 1/7 \\ 1/6 & 1/7 & 1/7 & 1/7 \end{bmatrix}$
    }
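
An illustrative sketch of how such a matrix could be applied per coefficient follows (Python/NumPy; the function name and the choice of which matrix to pass in are assumptions, with the first I-slice matrix above used as the example):

    import numpy as np

    def quantize_block(coeffs, f_matrix, q_bits):
        """Apply Z = int((|W| + f(i,j)*(1<<q_bits)) >> q_bits) * sgn(W) to a
        4x4 block of integer transform coefficients."""
        rounding = (f_matrix * (1 << q_bits)).astype(np.int64)
        z = (np.abs(coeffs).astype(np.int64) + rounding) >> q_bits
        return z * np.sign(coeffs).astype(np.int64)

    # Example: the I-slice matrix used when MBvariance is below 15 or above 60.
    f_i_example = np.array([[1/2, 1/2, 1/2, 1/3],
                            [1/2, 1/2, 1/2, 1/3],
                            [1/2, 1/2, 1/3, 1/4],
                            [1/3, 1/3, 1/4, 1/5]])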

Under certain conditions, it might be impossible for the encoder to perform temporal analysis using future frames. In this case, temporal analysis could be performed while considering only previously coded pictures, and by assuming that future pictures have similar temporal characteristics. For example, if the current picture has high similarity with the previous picture (e.g., MAPD_(k,k−1) is small), then it is assumed that the similarity with the next picture to be coded (MAPD_(k,k+1)) would also be small. Thus, adaptation of the encoding parameters could be based on already available information, while replacing all indices (k,k+1) with (k,k−1).

Turning now to FIG. 3, a video encoder is indicated generally by the reference numeral 300. An input of the video encoder 300 is connected in signal communication with an input of a pre-analysis block 310. The pre-analysis block 310 includes a plurality of frame delays 312 connected in signal communication to each other such that each of the plurality of frame delays 312 is connected sequentially in serial and all in parallel, the latter via a parallel signal path. The parallel signal path is also connected in signal communication with an input of a temporal analyzer 315. An output of the last frame delay 312 connected in serial and farthest away from the input of the encoder 300 is connected in signal communication with an input of a spatial analyzer 320, with an inverting input of a first summing junction 325, with a first input of a motion compensator 375, and with a first input of a motion estimator/mode decision block 370. An output of the first summing junction 325 is connected in signal communication with an input of a transformer 330. An output of the transformer 330 is connected in signal communication with a first input of a quantizer 335. An output of the quantizer 335 is connected in signal communication with a first input of a variable length coder 340 and with an input of an inverse quantizer 345. An output of the variable length coder 340 is an externally available output of the video encoder 300. An output of the inverse quantizer 345 is connected in signal communication with an input of an inverse transformer 350. An output of the inverse transformer 350 is connected in signal communication with a non-inverting first input of a second summing junction 355. An output of the second summing junction 355 is connected in signal communication with a first input of a loop filter 360. An output of the loop filter 360 is connected in signal communication with a first input of a picture reference store 365. An output of the picture reference store 365 is connected in signal communication with a second input of the motion estimator/mode decision block 370 and with a second input of the motion compensator 375. A first output of the motion estimator/mode decision block 370 is connected in signal communication with a second input of the variable length coder 340. A second output of the motion estimator/mode decision block 370 is connected in signal communication with a third input of the motion compensator 375. An output of the motion compensator 375 is connected in signal communication with a non-inverting input of the first summing junction 325 and with a non-inverting second input of the second summing junction 355. A first output of the spatial analyzer 320 is connected in signal communication with a second input of the quantizer 335. A second output of the spatial analyzer 320 is connected in signal communication with a second input of the loop filter 360, with a third input of the motion estimator/mode decision block 370, and with the non-inverting input of the first summing junction 325. A first output of the temporal analyzer 315 is connected in signal communication with the second input of the quantizer 335. A second output of the temporal analyzer 315 is connected in signal communication with a fourth input of the motion estimator/mode decision block 370. A third output of the temporal analyzer 315 is connected in signal communication with a third input of the loop filter 360 and with a second input of the picture reference store 365.

A group of pictures is considered during a temporal analysis step, which decides several parameters, including slice type decision, GOP structure, weighting parameters (through the motion estimator/mode decision block 370), quantization values and deadzoning (through the quantizer 335), reference order and handling (picture reference store 365), picture coding ordering, frame/field picture level adaptive decision, and even deblocking parameters (loop filter 360). Similarly, spatial analysis is performed on each coded frame, which can similarly impact quantization and deadzoning (quantizer 335), Lagrangian parameters and slice type decision (motion estimator/mode decision block 370), inter/intra mode decision, frame/field picture level and macroblock level adaptive decision, and deblocking (loop filter 360).

Turning now to FIG. 4, an exemplary process for encoding video signal data is indicated generally by the reference numeral 400. The process can analyze or encode the same bitstream multiple times while collecting and updating the required statistics in each iteration. These statistics are used in each subsequent pass to improve the encoding performance by adapting the encoder parameters given the video characteristics or user requirements. In particular, k frames (i.e., excluding non-stored pictures) are to be encoded, with L passes (also referred to herein as “repetitions” and “iterations”) and a window of size (N,M), where N is the total number of frames within the window and M is the number of overlapping frames between adjacent windows. The frame that is to be encoded is indexed using the variable frm, while the current position within a window is indexed using the variable w_(index).

The process includes a begin block 405 that passes control to a function block 410. The function block 410 sets the sequence size to k, sets the number of repetitions to L, sets a variable i to zero (0), and passes control to a function block 415. The function block 415 sets the window size to N, sets the overlap size to M, sets the variable frm to zero (0), and passes control to a function block 420. The function block 420 sets the variable w_(index) to zero (0), and passes control to a function block 425. Thus, it is to be appreciated that, for each encoding pass, the window parameters are initialized. This allows the use of different window sizes, or even their adaptation based on previous analysis steps (e.g., if a scene change was detected, then N and M could be adjusted accordingly to include only a complete scene).

The function block 425 performs temporal analysis for each window to be processed while considering all N frames within the window, generates temporal statistics (tstat_(i,frm . . . frm+N−1)), and optionally adapts or refines statistics from previous passes or encoding steps using the current statistics. The function block 425 then passes control to a function block 430. The function block 430 performs spatial analysis for the frame with index frm (w_(index) within the current window) until the condition w_(index)<N-M is no longer satisfied, and passes control to a function block 435. The function block 435 encodes these frames based on the results from the temporal and spatial analysis, generates/collects encoder statistics that can be used if multiple passes are required, and passes control to a function block 440.

Function block 440 increments the values of the variables frm and w_(index), and passes control to a decision block 445. The decision block 445 determines whether or not the variable frm is less than k.

If the variable frm is less than k, then control passes to a decision block 450 that determines whether or not w_(index) is less than (N-M). Otherwise, if the variable frm is not less than k, then control passes to a decision block 455 that determines whether or not i is less than L.

If w_(index) is less than (N-M), then control is passed back to function block 430. Otherwise, if w_(index) is not less than (N-M), then control is passed back to function block 420.

If i is less than L, then control is passed back to function block 415 for another encoding pass. Otherwise, if i is not less than L, then control is passed to an end block 460.
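
Read as pseudocode, the flow of FIG. 4 could be sketched roughly as follows (illustrative Python; the analysis and encoding helpers, and the way the pass counter i is advanced, are assumptions rather than elements shown in the figure):

    def multipass_windowed_encode(frames, L, N, M, temporal_analysis,
                                  spatial_analysis, encode_frame):
        """Encode the sequence in L passes over overlapping windows of N frames
        with an overlap of M frames between adjacent windows."""
        assert 0 <= M < N
        k = len(frames)
        stats = {}                           # statistics carried across passes
        for i in range(L):                   # blocks 410/415: initialize a pass
            frm = 0
            while frm < k:                   # block 420: start a new window
                w_index = 0
                window = frames[frm:frm + N]
                # Block 425: temporal analysis over all N frames of the window,
                # optionally refining statistics from earlier passes.
                stats = temporal_analysis(window, stats)
                # Blocks 430 through 450: spatial analysis and encoding of the
                # first N - M frames; the last M frames overlap with the next window.
                while frm < k and w_index < N - M:
                    frame_stats = spatial_analysis(frames[frm])
                    encode_frame(frames[frm], stats, frame_stats)
                    frm += 1
                    w_index += 1
        return stats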

A description will now be given of some of the many attendant advantages/features of the present invention, according to various illustrative embodiments of the present invention. For example, one advantage/feature is the providing of an encoding apparatus and method that performs video analysis based on constrained but overlapping windows of the content to be coded, and uses this information to adapt encoding parameters. Another advantage/feature is the use of spatio-temporal analysis in the video analysis. Yet another advantage/feature is that a preliminary encoding pass is considered for the video analysis. Moreover, another advantage/feature is that spatio-temporal analysis and a preliminary encoding pass are jointly considered in the video analysis. Also, another advantage/feature is that at least one of picture coding type, edge, mean, and variance information is used for spatial analysis, and for adaptation of Lagrangian parameters, quantization, and deadzoning. Still another advantage/feature is that absolute difference and variance are used to adapt quantization parameters. Additionally, another advantage/feature is that the performed video analysis only considers previously coded pictures. Further, another advantage/feature is that the performed video analysis is used to decide at least one of several encoding parameters including, but not limited to, slice type decision, GOP and picture coding structure and order, weighting parameters, quantization values and deadzoning, Lagrangian parameters, number of references, reference order and handling, frame/field picture and macroblock decisions, deblocking parameters, inter block size decision, intra spatial prediction, and direct modes. Also, another advantage/feature is that the video analysis can be performed using multiple iterations, while considering previously generated statistics to adapt the encoding parameters or the analysis statistics. Moreover, another advantage/feature is that window sizes and overlapping window regions are adaptable based on previously generated analysis statistics.

These and other features and advantages of the present invention may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.

Most preferably, the teachings of the present invention are implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform, such as an additional data storage unit and a printing unit.

It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present invention.

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.

1. An encoder for encoding video signal data corresponding to a plurality of pictures, the encoder comprising an overlapping window analysis unit for performing a video analysis of the video signal data using a plurality of overlapping analysis windows with respect to at least some of the plurality of pictures corresponding to the video signal data, and for adapting encoding parameters for the video signal data based on a result of the video analysis.

2. The encoder as defined in claim 1, wherein said overlapping windows analysis unit performs the video analysis of the video signal data using spatio-temporal analysis.

3. The encoder as defined in claim 2, wherein said overlapping windows analysis unit uses at least one of picture coding type information, edge information, mean information, and variance information for at least one of the spatio-temporal analysis, and for adaptation of Lagrangian parameters and quantization parameters and deadzoning.

4. The encoder as defined in claim 3, wherein said overlapping windows analysis unit adapts the quantization parameters using absolute difference and variance.

5. The encoder as defined in claim 1, wherein said overlapping windows analysis unit performs the video analysis of the video signal data using a preliminary encoding pass.

6. The encoder as defined in claim 1, wherein said overlapping windows analysis unit performs the video analysis of the video signal data using both spatio-temporal analysis and a preliminary encoding pass.

7. The encoder as defined in claim 6, wherein said overlapping windows analysis unit uses at least one of picture coding type information, edge information, mean information, and variance information for at least one of the spatio-temporal analysis, for adaptation of Lagrangian parameters and quantization parameters, and for deadzoning.

8. The encoder as defined in claim 7, wherein said overlapping windows analysis unit adapts the quantization parameters using absolute difference and variance.

9. The encoder as defined in claim 1, wherein the video signal data comprises a plurality of frames, each of the plurality of frames representing a corresponding picture, and said overlapping analysis unit performs the video analysis so as to consider only previously coded pictures.

10. The encoder as defined in claim 1, wherein the encoding parameters comprise at least one of slice type, picture and Group of Pictures (GOP) coding structure and order, weighting parameters, quantization values and deadzoning, Lagrangian parameters, a number of references, reference order and handling, frame/field picture and macroblock parameters, deblocking parameters, inter block size, intra spatial prediction, and direct modes.

11. The encoder as defined in claim 1, wherein said overlapping windows analysis unit performs the video analysis over multiple iterations, and adapts one of the encoding parameters and analysis statistics based on the previously generated analysis statistics.

12. The encoder as defined in claim 1, wherein each of the overlapping windows has a window size of P pictures and an overlap size associated therewith, and said overlapping windows analysis unit adapts the window size and the overlap size based on previously generated analysis statistics.
13. A method for encoding video signal data corresponding to a plurality of pictures, comprising the steps of: performing a video analysis of the video signal data using a plurality of overlapping analysis windows with respect to at least some of the plurality of pictures corresponding to the video signal data; and adapting encoding parameters for the video signal data based on a result of the video analysis.

14. The method as defined in claim 13, wherein said performing step performs the video analysis of the video signal data using spatio-temporal analysis.

15. The method as defined in claim 14, wherein said performing and adapting steps respectively use at least one of picture coding type information, edge information, mean information, and variance information for at least one of the spatio-temporal analysis, and for adaptation of Lagrangian parameters and quantization parameters and deadzoning.

16. The method as defined in claim 15, wherein the quantization parameters are adapted using absolute difference and variance.

17. The method as defined in claim 13, wherein said performing step performs the video analysis of the video signal data using a preliminary encoding pass.

18. The method as defined in claim 13, wherein said performing step performs the video analysis of the video signal data using both spatio-temporal analysis and a preliminary encoding pass.

19. The method as defined in claim 18, wherein said performing and adapting steps respectively use at least one of picture coding type information, edge information, mean information, and variance information for at least one of the spatio-temporal analysis, for adaptation of Lagrangian parameters and quantization parameters, and for deadzoning.

20. The method as defined in claim 19, wherein the quantization parameters are adapted using absolute difference and variance.

21. The method as defined in claim 13, wherein the video signal data comprises a plurality of frames, each of the plurality of frames representing a corresponding picture, and said performing step performs the video analysis so as to consider only previously coded pictures.

22. The method as defined in claim 13, wherein the encoding parameters comprise at least one of slice type, picture and Group of Pictures (GOP) coding structure and order, weighting parameters, quantization values and deadzoning, Lagrangian parameters, a number of references, reference order and handling, frame/field picture and macroblock parameters, deblocking parameters, inter block size, intra spatial prediction, and direct modes.

23. The method as defined in claim 13, wherein said performing step performs the video analysis over multiple iterations, and said adapting step adapts one of the encoding parameters and analysis statistics based on the previously generated analysis statistics.

24. The method as defined in claim 13, wherein each of the overlapping windows has a window size and an overlap size associated therewith, and said performing step comprises the step of adapting the window size and the overlap size based on previously generated analysis statistics.